Conversation
|
Can we add some tests like we have for activity to verify the parameter is properly wired through and respected |
We have coverage in NexusWorkflowTest.java. |
| Assert.assertTrue("OPERATION_TIMEOUT should be positive", operationTimeoutMs > 0); | ||
|
|
||
| // Sleep longer than schedule-to-start timeout to trigger the timeout | ||
| Thread.sleep(2000); |
There was a problem hiding this comment.
Should be able to use testWorkflowRule.sleep here to avoid waiting real time in the time skipping test server, but I would prefer the test not use a sleep and just wait for the condition to be true. Should be possible to wait for the operation to fail by looking at the workflow describe output no?
There was a problem hiding this comment.
I think the sleep is reasonable here and I don't expect it to be flaky. If you point me to an example of how to use some "wait for condition" or "eventually" in the Java tests, I can adapt that.
NexusWorkflowTest.java doesn't use the Java SDK so it is not providing coverage of the SDK |
|
you also need to update |
maciejdudko
left a comment
There was a problem hiding this comment.
I left a few comments about timeout calculation in test server but otherwise looks good.
| long elapsedSeconds = Timestamps.between(scheduledTime, currentTime).getSeconds(); | ||
| long elapsedMillis = elapsedSeconds * 1000; |
There was a problem hiding this comment.
This rounds off milliseconds from the difference. At the extreme, a sub-second timeout will be rounded to zero.
| long elapsedSeconds = Timestamps.between(scheduledTime, currentTime).getSeconds(); | |
| long elapsedMillis = elapsedSeconds * 1000; | |
| long elapsedMillis = com.google.protobuf.util.Durations.toMillis( | |
| Timestamps.between(scheduledTime, currentTime)); |
| // Calculate minimum of all applicable timeouts | ||
| Long remainingMillis = null; | ||
|
|
||
| if (!isStarted && scheduledEvent.hasScheduleToStartTimeout()) { |
There was a problem hiding this comment.
ScheduleToStartTimeout should not be part of the calculation here.
| com.google.protobuf.util.Durations.toMillis( | ||
| scheduledEvent.getScheduleToCloseTimeout()); | ||
| if (scheduleToCloseMillis > 0) { | ||
| long remaining = scheduleToCloseMillis - elapsedMillis; |
There was a problem hiding this comment.
This code should handle the situation where elapsedMillis is larger than scheduleToCloseMillis for some reason, just for extra safety.
| long remaining = scheduleToCloseMillis - elapsedMillis; | |
| long remaining = Math.max(1, scheduleToCloseMillis - elapsedMillis); |
| long remaining = startToCloseMillis - elapsedMillis; | ||
| remainingMillis = | ||
| (remainingMillis == null) ? remaining : Math.min(remainingMillis, remaining); |
There was a problem hiding this comment.
StartToCloseTimeout should not have elapsed time subtracted.
| long remaining = startToCloseMillis - elapsedMillis; | |
| remainingMillis = | |
| (remainingMillis == null) ? remaining : Math.min(remainingMillis, remaining); | |
| remainingMillis = | |
| (remainingMillis == null) ? startToCloseMillis : Math.min(remainingMillis, startToCloseMillis); |
maciejdudko
left a comment
There was a problem hiding this comment.
There are two tests failing.
io.temporal.workflow.nexus.HeaderTest#testOperationHeaders (test server only): the test expects both request-timeout and operation-timeout headers to be always set. Currently, test server doesn't send operation-timeout if it's not set in options. The real server seems to have a default schedule-to-close timeout of 200 seconds, though I'm not sure where it comes from. The test server should match the real server and have a default timeout too.
io.temporal.testserver.functional.NexusWorkflowTest#testNexusOperationStartToCloseTimeout (real server only): there are two problems. One is that the real server sets error message as "operation timed out" but the test expects "operation timed out after starting". I think we should have a contains-substring check here rather than equality. The second I was unable to reproduce locally but it failed 2/2 times in CI - "OPERATION_TIMEOUT should be <= start-to-close timeout" assertion failed. Might be because of JDK8, or something else.
fa5185d to
1bcbc59
Compare
|
Finally getting back to this.
Yeah, I noticed this, this PR fixes the bug in the Java test server where the operation-timeout would be set when it shouldn't have but looks like it's still not quite mimicking the real server. The "real" server defaults schedule to close timeout to the workflow run timeout, we need to fix this in the Java test server too.
I'm changing the error messages everywhere to |
333a15e to
f568ffd
Compare
.github/workflows/ci.yml
Outdated
| --dynamic-config-value frontend.workerVersioningDataAPIs=true \ | ||
| --dynamic-config-value component.nexusoperations.recordCancelRequestCompletionEvents=true \ | ||
| --dynamic-config-value component.callbacks.allowedAddresses='[{"Pattern":"*","AllowInsecure":true}]' \ | ||
| --dynamic-config-value history.MaxBufferedQueryCount=100000 \ |
There was a problem hiding this comment.
Why are we adding a bunch of (unrelated?) dynamic configs? Is this a merge conflict with us cleaning them up or are they actually needed?
There was a problem hiding this comment.
Looks like a merge conflict that wasn't properly resolved. Thanks.
|
|
||
| @Test | ||
| public void testNexusOperationScheduleToStartTimeout() { | ||
| assumeTrue( |
There was a problem hiding this comment.
Why are we skipping for the real server?
There was a problem hiding this comment.
This is a mistake. Let me remove this.
| attributes.setScheduleToCloseTimeout( | ||
| ProtobufTimeUtils.toProtoDuration(input.getOptions().getScheduleToCloseTimeout())); | ||
| attributes.setScheduleToStartTimeout( | ||
| ProtobufTimeUtils.toProtoDuration(input.getOptions().getScheduleToStartTimeout())); |
There was a problem hiding this comment.
note: ProtobufTimeUtils.toProtoDuration will return a zero duration here not a null duration, sometimes the server treats a zero duration different then a empty duration so we need to be careful that is not the case here.
There was a problem hiding this comment.
Server treats zero and null the same here.
| Assert.assertEquals("nexus operation completed unsuccessfully", failure.getMessage()); | ||
| io.temporal.api.failure.v1.Failure cause = failure.getCause(); | ||
| Assert.assertEquals("deliberate test failure", cause.getMessage()); | ||
| Assert.assertTrue(cause.hasApplicationFailureInfo()); |
There was a problem hiding this comment.
What type of failure info does the cause have now?
|
|
||
| @Test | ||
| public void conflictPolicyUseExisting() { | ||
| org.junit.Assume.assumeTrue( |
There was a problem hiding this comment.
This also shouldn't have been committed.
c54499d to
221ffb3
Compare
What was changed
ScheduleToStartTimeoutandStartToCloseTimeoutfor Nexus operations in the Java SDK and test server.timeoutNexusOperation()pollNexusTaskQueue()and use the correct header formatGRPC_MESSAGE_TOO_LARGEworkflow task failures withFORCE_CLOSE_COMMANDcause, matching real server behaviornexusFailureToAPIFailureno longer wraps non-temporal failures inApplicationFailureInfo, andhandlerErrorToFailureno longer sets message at the top levelRETRY_STATE_TIMEOUTfor schedule-to-start and schedule-to-close timeouts (server fix for temporalio/temporal#3667)TerminatedFailureinstead ofTimeoutFailureApplicationFailureInfowrapper